Phonotactic Language Identification for Singing
Abstract
In the past decades, many successful approaches for language identification have been published. However, almost none of these approaches were developed with singing in mind. Singing differs from speech in many respects, such as a wider variance of fundamental frequencies and phoneme durations, vibrato, pronunciation differences, and different semantic content. We present a new phonotactic language identification system for singing based on phoneme posteriorgrams. These posteriorgrams were extracted using acoustic models trained on English speech (TIMIT) and on an unannotated English-language a cappella singing dataset (DAMP). SVM models were then trained on phoneme statistics. The models are evaluated on a set of amateur singing recordings from YouTube and, for comparison, on the OGI Multilanguage corpus. While the results on a cappella singing are somewhat worse than those previously obtained using i-vector extraction, this approach is easier to implement. Phoneme posteriorgrams need to be extracted for many applications, and can easily be reused for language identification with this approach. The results on singing improve significantly when the acoustic models have also been trained on singing. Interestingly, the best results on the OGI speech corpus are also obtained with acoustic models trained on singing.
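The step from posteriorgrams to SVM input can be illustrated with a minimal sketch. The abstract does not say which phoneme statistics were used, so the choice below (per-phoneme mean and variance of the posteriors over time) is an assumption, and the function name `phoneme_statistics` is hypothetical. The posteriorgram is represented as a list of frames, each holding one posterior probability per phoneme class.

```python
def phoneme_statistics(posteriorgram):
    """Collapse a (frames x phonemes) posteriorgram into one fixed-length vector.

    Assumption: the statistics are the per-phoneme mean and variance of the
    posteriors over all frames. The source abstract does not specify them.
    """
    if not posteriorgram or not posteriorgram[0]:
        raise ValueError("posteriorgram must contain at least one frame and one phoneme")

    n_frames = len(posteriorgram)
    n_phonemes = len(posteriorgram[0])

    # Mean posterior of each phoneme across time.
    means = [
        sum(frame[p] for frame in posteriorgram) / n_frames
        for p in range(n_phonemes)
    ]
    # Population variance of each phoneme's posterior across time.
    variances = [
        sum((frame[p] - means[p]) ** 2 for frame in posteriorgram) / n_frames
        for p in range(n_phonemes)
    ]
    # Feature vector layout: all means first, then all variances.
    return means + variances


# Toy recording: two frames, two phoneme classes.
features = phoneme_statistics([[0.5, 0.5], [1.0, 0.0]])
print(features)
```

Because every recording yields a vector of the same length regardless of its duration, such vectors could be passed, one per recording, to any standard SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) with the language as the class label.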
Related resources
Comparing different model configurations for language identification using a phonotactic approach
In this paper different model configurations for language identification using a phonotactic approach are explored. Identification experiments were carried out on the 11-language telephone speech corpus OGI-TS, containing calls in French, English, German, Spanish, Japanese, Korean, Mandarin, Tamil, Farsi, Hindi, and Vietnamese. Phone sequences output by one or multiple phone recognizers are res...
An efficient phonotactic-acoustic system for language identification
This paper presents a combined two-component system for language identification based on phonotactic and acoustic features. The phonotactic part, consisting of a multilingual phone-recognizer with a double bigram-decoding architecture and a phonetic-context mapping, is supported by a second part with pronunciation modeling of the recognized phone-sequence using Gaussian density models. Both parts ...
Fusion of contrastive acoustic models for parallel phonotactic spoken language identification
This paper investigates combining contrastive acoustic models for parallel phonotactic language identification systems. PRLM, a typical phonotactic system, uses a phone recogniser to extract phonotactic information from the speech data. Combining multiple PRLM systems together forms a Parallel PRLM (PPRLM) system. A standard PPRLM system utilises multiple phone recognisers trained on different ...
Automatic language identification using a segment-based approach
Automatic Language Identification (ALI) is the problem of automatically identifying the language of an utterance through the use of a computer. In 1977, House and Neuburg proposed an approach to ALI which focused on the phonotactic constraints of different languages. Their work suggested that simple language models could be used effectively for language identification if an accurate phonetic re...
Phonotactic spoken language identification with limited training data
We investigate the addition of a new language, for which limited resources are available, to a phonotactic language identification system. Two classes of approaches are studied: in the first class, only existing phonetic recognizers are employed, whereas an additional phonetic recognizer in the new language is created for the second class. It is found that the number of acoustic recognizers emp...